
Pyrel Mong
Posted - 2008.09.02 20:45:00
Originally by: Greg Vaganza Hello to all fans and freaks of and in EVE. I know from personal experience that deploying a patch or an extension for a software product as complex as EVE Online can be very hard when something fails. So to all the people who are angry right now: please be assured that everybody in Iceland and/or the USA who is part of this project is, at this moment, doing their best to get it running.
When a company like CCP deploys a patch, they will have run a battery of tests beforehand to ensure it works completely; otherwise they would have moved the patch day. But even when everybody on the development team is sure that everything works, and then something fails during or after deployment, you feel like an idiot and ask yourself what went wrong. In a case like this one, a single character really can be enough to make the deployment fail, and very complex interactions between code and data can be the cause. Maybe the database now has inconsistent entries, and the developers have to build tools or database views just to find out what went wrong. Or a million lines of code have to be debugged to pin down exactly where the error appears. After that, the error has to be reproduced to understand its cause. And once the bug is fixed, new tests have to be run to ensure that the "fix" has not introduced another bug. This takes time, because we are all only human.
So please picture CCP right now, where (more or less) everyone is running around like headless chickens, asking stupid questions, unsure of what they have done over the last weeks, then returning to their seats and staring uncomprehendingly at their code.
So, from me to the developers: good luck. You are doing a good job, just try to stay cool. Perhaps try something like extreme programming, because four eyes are better than two (I know why...). Best wishes! Greg
One can avoid downtime by keeping a second/backup/redundant/auxiliary set of machines for when the primary systems run into hardware trouble (which is good for reliability anyway). Take a complete backup of all systems to secondary media, be it DVD, Blu-ray or NAS, I don't care. Then deploy the patch to the backup cluster and test it against a copy of the database you just backed up, to make sure it works as advertised. Once everything is good, announce in game that "We're shutting servers down in 30 min, so get your ass to a station," and once the server is down, quickly transfer the current database from the live server to the backup cluster and reroute traffic to it by unplugging the primary; the redundancy will kick in and traffic will flow to the backup cluster with the patch already applied. Now apply the patch to the primary cluster while the backup serves the hungry crowd, and switch back to the primary at a comfortable time, like the next maintenance window.
Total downtime? About the time it takes to swap databases. If you have set things up properly, it's just a matter of changing an IP address, so maybe five minutes including the restart?
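Just to make the sequence concrete, here is a rough sketch in Python of the kind of switchover script I have in mind. All of the hostnames, the run/snapshot_database/switch_traffic helpers and the placeholder shell commands (db-dump, apply-patch, set-backend and so on) are made up for illustration; I obviously have no idea how CCP's cluster is actually wired up.

# Rough sketch of the blue/green switch described above. Assumes two
# clusters reachable over SSH and a load balancer whose target can be
# flipped by rewriting one address. Every hostname and remote command
# here is a placeholder, not CCP's real setup.

import subprocess
import time

PRIMARY = "primary.cluster.local"   # hypothetical live cluster
BACKUP = "backup.cluster.local"     # hypothetical standby cluster

def run(host: str, command: str) -> None:
    """Run a (placeholder) shell command on a remote host via ssh."""
    subprocess.run(["ssh", host, command], check=True)

def snapshot_database(src: str, dst: str) -> None:
    """Copy the database from src to dst (tools are assumptions)."""
    run(src, f"db-dump /var/db/tranquility | ssh {dst} db-restore /var/db/tranquility")

def switch_traffic(target: str) -> None:
    """Point the public endpoint at target, e.g. by updating a
    load-balancer backend or a DNS record; details depend on the setup."""
    run("loadbalancer.local", f"set-backend {target}")

def deploy_with_minimal_downtime(patch: str) -> None:
    # 1. Rehearse on the standby cluster against a copy of the data.
    snapshot_database(PRIMARY, BACKUP)
    run(BACKUP, f"apply-patch {patch}")
    run(BACKUP, "run-smoke-tests")          # abort here if the tests fail

    # 2. Short outage: warn players, stop writes, copy the current
    #    database, then flip traffic to the already-patched standby.
    run(PRIMARY, "broadcast 'Shutting down in 30 minutes, dock up!'")
    time.sleep(30 * 60)
    run(PRIMARY, "stop-server")
    snapshot_database(PRIMARY, BACKUP)
    run(BACKUP, "start-server")
    switch_traffic(BACKUP)                  # downtime ends here

    # 3. Patch the primary at leisure; swap back at the next maintenance.
    run(PRIMARY, f"apply-patch {patch}")

if __name__ == "__main__":
    deploy_with_minimal_downtime("latest.patch")

The point is simply that the only window where players are locked out is between stopping the primary and flipping the traffic over; everything slow (testing, patching) happens while a cluster is out of rotation.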
Everybody happy!